Backdoors in Neural Models of Source Code
Deep neural networks are vulnerable to a range of adversaries. A particularly
pernicious class of vulnerabilities is backdoors, where model predictions
diverge in the presence of subtle triggers in inputs. An attacker can implant a
backdoor by poisoning the training data to yield a desired target prediction on
triggered inputs. We study backdoors in the context of deep learning for source
code. (1) We define a range of backdoor classes for source-code tasks and show
how to poison a dataset to install such backdoors. (2) We adapt and improve
recent algorithms from robust statistics for our setting, showing that
backdoors leave a spectral signature in the learned representation of source
code, thus enabling detection of poisoned data. (3) We conduct a thorough
evaluation on different architectures and languages, showing the ease of
injecting backdoors and our ability to eliminate them.
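The spectral-signature idea above can be made concrete with a short sketch: poisoned examples tend to project unusually strongly onto the top singular direction of the (centered) matrix of learned representations, so ranking examples by that projection flags likely poison. The sketch below assumes a representation matrix `reps` and a removal fraction chosen by the defender; the function names and the 5% default are illustrative, not taken from the paper.

```python
import numpy as np

def spectral_outlier_scores(reps: np.ndarray) -> np.ndarray:
    """Score each example by its squared projection onto the top
    singular direction of the centered representation matrix."""
    centered = reps - reps.mean(axis=0, keepdims=True)
    # Top right-singular vector of the centered matrix.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    top_dir = vt[0]
    return (centered @ top_dir) ** 2

def filter_suspected_poison(reps: np.ndarray, remove_frac: float = 0.05):
    """Return indices to keep after dropping the highest-scoring
    examples, which are the most likely to carry the backdoor trigger."""
    scores = spectral_outlier_scores(reps)
    n_remove = int(len(scores) * remove_frac)
    keep = np.argsort(scores)[: len(scores) - n_remove]
    return keep
```

In practice this filtering is run per label class on representations from the trained model, after which the model is retrained on the surviving examples.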
PECAN: A Deterministic Certified Defense Against Backdoor Attacks
Neural networks are vulnerable to backdoor poisoning attacks, where the
attackers maliciously poison the training set and insert triggers into the test
input to change the prediction of the victim model. Existing defenses for
backdoor attacks either provide no formal guarantees or come with
expensive-to-compute and ineffective probabilistic guarantees. We present
PECAN, an efficient and certified approach for defending against backdoor
attacks. The key insight powering PECAN is to apply off-the-shelf test-time
evasion certification techniques on a set of neural networks trained on
disjoint partitions of the data. We evaluate PECAN on image classification and
malware detection datasets. Our results demonstrate that PECAN can (1)
significantly outperform the state-of-the-art certified backdoor defense in
both defense strength and efficiency, and (2) reduce the attack success rate
on real backdoor attacks by an order of magnitude compared to a range of
baselines from the literature.
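The key insight lends itself to a short sketch: hash each training example into one of several disjoint partitions (so a poisoned example can corrupt only one model), train a model per partition, and at test time let each model vote only if an off-the-shelf evasion certifier proves its prediction stable. In the sketch below, `certify_fn` is a hypothetical stand-in for such a certifier, and the margin rule is a simplified illustration, not the paper's exact certificate.

```python
import hashlib
from collections import Counter

def partition_id(example_key: str, n_partitions: int) -> int:
    # Deterministic assignment so any poisoned example lands in exactly
    # one partition and affects at most one trained model.
    digest = hashlib.sha256(example_key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_partitions

def certified_predict(models, x, certify_fn):
    """Each partition model votes only if the evasion certifier proves
    its prediction stable under the trigger perturbation; otherwise it
    abstains. Returns the winning label and a simplified robustness margin."""
    votes = Counter()
    abstentions = 0
    for model in models:
        label, is_certified = certify_fn(model, x)
        if is_certified:
            votes[label] += 1
        else:
            abstentions += 1
    if not votes:
        return None, 0
    (top, top_count), *rest = votes.most_common(2) + [(None, 0)]
    runner_up = rest[0][1]
    # Poisoning r partitions can flip at most r votes; abstaining models
    # are counted against the winner (a conservative, simplified bound).
    margin = (top_count - runner_up - abstentions) // 2
    return top, max(margin, 0)
```

Because the partitions are disjoint and the assignment is deterministic, the whole pipeline needs no randomness, which is what makes the resulting guarantee deterministic rather than probabilistic.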
Distribution Policies for Datalog
Modern data management systems extensively use parallelism to speed up query processing over massive volumes of data. This trend has inspired a rich line of research on how to formally reason about the parallel complexity of join computation. In this paper, we go beyond joins and study the parallel evaluation of recursive queries. We introduce a novel framework to reason about multi-round evaluation of Datalog programs, which combines implicit predicate restriction with distribution policies to allow expressing a combination of data-parallel and query-parallel evaluation strategies. Using our framework, we reason about key properties of distributed Datalog evaluation, including parallel-correctness of the evaluation strategy, disjointness of the computation effort, and bounds on the number of communication rounds.
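To make the notion of a distribution policy concrete, consider the classic transitive-closure program T(x,y) :- E(x,y); T(x,y) :- T(x,z), E(z,y). A hash-based policy can place facts by their join attribute z, so matching T- and E-facts meet on the same server, with newly derived T-facts redistributed once per communication round. The single-machine simulation below is a toy illustration under these assumptions; the paper's framework (implicit predicate restriction, general multi-round strategies) is considerably more general.

```python
def server_of(value, n_servers):
    # Hash-based distribution policy: a fact is assigned to the server
    # determined by its join attribute, so T(x,z) and E(z,y) co-locate.
    return hash(value) % n_servers  # fine within a single simulated run

def distributed_transitive_closure(edges, n_servers=4):
    """Toy semi-naive evaluation of transitive closure where E(z,y) lives
    on server_of(z) and each new T(x,z) is shipped to server_of(z) once
    per round (one communication round per iteration)."""
    local_E = [set() for _ in range(n_servers)]
    for (z, y) in edges:
        local_E[server_of(z, n_servers)].add((z, y))
    T = set(edges)        # base rule: T(x,y) :- E(x,y)
    delta = set(edges)
    while delta:
        new_facts = set()
        for (x, z) in delta:
            s = server_of(z, n_servers)           # redistribution step
            for (z2, y) in local_E[s]:
                if z2 == z:                       # local join T(x,z), E(z,y)
                    new_facts.add((x, y))
        delta = new_facts - T
        T |= delta
    return T

# Example: a 4-node path yields all 6 reachability pairs.
print(sorted(distributed_transitive_closure({(1, 2), (2, 3), (3, 4)})))
```

Parallel-correctness here amounts to the union of the servers' local joins coinciding with centralized evaluation, and the number of iterations of the loop corresponds to the communication-round bound the framework reasons about.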